Learn the best practices for searching in hyphenated attributes that can have multiple formats, such as SKUs, ISBNs, phone numbers, and serial numbers.
1234-XYZ-B5
with any of these queries:
1234-XYZ-B5
1234XYZB5
1234XYZ-B5
1234-XYZ
XYZ-B5
-), pound signs (#), or plus signs (+).
By default, Algolia doesn’t index non-alphanumeric characters, or “separators”, meaning they aren’t searchable. However, they’re essential for tokenization.
For example, the string 1234-XYZ-B5 is tokenized as 1234, -, XYZ, -, B5, because the hyphen (-) is a separator, and all the other characters aren’t. Then, by default, Algolia only indexes the non-separator tokens 1234, XYZ, B5. The same is true for the string 1234 XYZ B5, since a space is also a separator. That’s why 1234-XYZ-B5 and 1234 XYZ B5 are functionally the same for the engine. Both would be a match, whether your user searches for 1234-XYZ-B5 or 1234 XYZ B5.
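The tokenization described above can be approximated with a short sketch. This is a simplified model for illustration, not Algolia's actual tokenizer: it assumes ASCII identifiers and treats every non-alphanumeric character as a separator. The function name `tokenize` is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split on runs of non-alphanumeric characters and drop the separators."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


hyphenated = tokenize("1234-XYZ-B5")
spaced = tokenize("1234 XYZ B5")

print(hyphenated)            # ['1234', 'XYZ', 'B5']
print(hyphenated == spaced)  # True
```

Under this model, the hyphenated and the spaced strings produce the same token list, which is why they behave the same at query time.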
1234-XYZ-B5 must be different from the results for 1234 XYZ B5. This is very rarely the case for ISBN, SKU, phone number, or serial number use cases, but could be true for other use cases, such as when searching for the programming languages C vs. C++.
1234XYZB5. While Algolia handles some splitting and concatenation, there are special considerations when numbers are involved, and Algolia may not handle several concatenations at once. For that reason, index every format your users search with, ignoring variations that differ only in their separators.
For example, this indexing format includes all different formats but only uses spaces and doesn’t include any versions with hyphens:
1234-XYZ-B5
1234 XYZ B5
1234XYZB5
1234XYZ-B5
1234XYZ B5
1234-XYZB5
1234 XYZB5
1234-XYZ
1234 XYZ
XYZ-B5
XYZ B5
["1234", "XYZ", "B5", "1234XYZB5", "1234XYZ", "XYZB5"], these are the only ones that need indexing.
That said, you may not want to undertake the work necessary to deduplicate redundant tokens. Additionally, having all variants allows for a more accurate proximity score, if your users tend to search with tokens in the same order as the original unformatted version. For example, while a user may search for 1234 XYZB5, they probably won’t search for XYZB5 1234.
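If you do want the deduplicated token list, it can be derived mechanically. The following is a minimal sketch, assuming an identifier whose segments are separated by non-alphanumeric characters. The helper name `index_tokens` is hypothetical and not part of any Algolia client. It concatenates every run of adjacent segments, keeping the original order.

```python
import re


def index_tokens(identifier: str) -> list[str]:
    """Return each segment plus every concatenation of adjacent segments."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", identifier) if p]
    tokens: list[str] = []
    # Walk every run of adjacent segments, shortest runs first.
    for length in range(1, len(parts) + 1):
        for start in range(len(parts) - length + 1):
            token = "".join(parts[start:start + length])
            if token not in tokens:  # skip duplicates, keep first-seen order
                tokens.append(token)
    return tokens


print(index_tokens("1234-XYZ-B5"))
# ['1234', 'XYZ', 'B5', '1234XYZ', 'XYZB5', '1234XYZB5']
```

The result contains the same six tokens as the list above, in a different order. Out-of-order combinations such as XYZB5 followed by 1234 are deliberately not generated.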
attributesToRetrieve
.
searchableAttributes
searchableAttributes
. These attributes can be either strings or arrays.
searchableAttributes
.
searchableAttributes
.
disableTypoToleranceOnAttributes
.
disablePrefixOnAttributes
.
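As an illustration of how the settings named above might fit together, here is a sketch of a settings payload. The attribute names `name`, `sku`, and `sku_variants` are hypothetical, and the sketch assumes you want exact, whole-token matching on identifier attributes. It only builds and prints the payload; how you send it depends on the API client and version you use.

```python
import json

# Hypothetical attribute names; replace them with the ones in your records.
identifier_attributes = ["sku", "sku_variants"]

settings = {
    # Make the identifier and its indexed variants searchable.
    "searchableAttributes": ["name", *identifier_attributes],
    # Identifiers should match exactly, so typos aren't tolerated on them.
    "disableTypoToleranceOnAttributes": identifier_attributes,
    # Require whole tokens rather than prefixes on the identifier attributes.
    "disablePrefixOnAttributes": identifier_attributes,
}

print(json.dumps(settings, indent=2))
```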
-), plus (+), and parentheses ((, )). Whether these characters are in the query or the index, Algolia won’t search for them.
This is by design: searching for +33 returns all records with an attribute starting with +33 or 33, because the engine ignores the plus sign (+). If you want users to search with special characters, you must tell the engine to index them with separatorsToIndex.
For example, if you include + in separatorsToIndex, searching for +33 only returns records containing both + and 33. Since adding separatorsToIndex can make a search more restrictive and complex, it’s generally not desirable to do so for these use cases.
Don’t add a character to separatorsToIndex unless its presence distinguishes between different records, for example, if searches for 1234-XYZ-B5 and 1234 XYZ B5 should return different results. This is rarely true for SKU, ISBN, phone number, and serial number use cases.
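The effect of indexing a separator can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the engine's real matching logic: it ignores prefix matching, typo tolerance, and ranking, and treats an indexed separator as its own token. The function names and sample records are hypothetical.

```python
import re


def tokenize(text: str, separators_to_index: str = "") -> list[str]:
    """Keep alphanumeric runs, plus any separator explicitly marked as indexed."""
    tokens = []
    for match in re.finditer(r"([0-9A-Za-z]+)|(.)", text, flags=re.DOTALL):
        word, separator = match.groups()
        if word:
            tokens.append(word)
        elif separator in separators_to_index:
            tokens.append(separator)
    return tokens


def matches(query: str, record: str, separators_to_index: str = "") -> bool:
    """A record matches when it contains every token of the query."""
    record_tokens = tokenize(record, separators_to_index)
    return all(t in record_tokens for t in tokenize(query, separators_to_index))


records = ["+33 1 23 45 67 89", "33 Main Street"]

# Default behavior: the plus sign is ignored, so both records match.
print([r for r in records if matches("+33", r)])

# With "+" indexed, only the record that contains "+" matches.
print([r for r in records if matches("+33", r, separators_to_index="+")])
```

In this model, indexing + narrows the results for +33 to the phone number record, which shows how separatorsToIndex makes a search more restrictive.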
removeWordsIfNoResults enabled
removeWordsIfNoResults relaxes the query criteria when the engine initially doesn’t find any results.
Because the engine treats special characters differently from other characters, this parameter might not work as expected when searching for hyphenated attributes. Refer to the guide on using removeWordsIfNoResults with non-alphanumeric characters for more information.